Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world: petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders of magnitude performance improvement over alternative systems.
Partitioning vs. Replication for Token-Based Commodity Distribution
The proliferation of e-commerce has enabled a new set of applications that
allow globally distributed purchasing of commodities such as books, CDs, and
travel tickets over the Internet. These commodities can be represented online
by tokens, which can be distributed among servers to enhance the performance
and availability of such applications. There are two main approaches for
distributing such tokens: replication and partitioning. Token replication
requires expensive distributed synchronization protocols to provide data
consistency, and is subject to both high latency and blocking in case of
network partitions. On the other hand, token partitioning allows many
transactions to execute locally without any global synchronization, which
results in low latency and immunity against network partitions.
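The contrast can be illustrated with a toy sketch of the partitioning side: each server owns a disjoint share of a commodity's tokens, so a sale is a purely local decrement with no distributed synchronization. This is only an illustration under simplified assumptions; the class and method names are invented here and do not reflect the paper's actual DVP algorithms.

```python
class TokenServer:
    """Toy model of token partitioning (names hypothetical): each server
    owns a disjoint pool of tokens for one commodity, so selling needs
    only a local decrement -- no quorum or global lock."""

    def __init__(self, name, tokens):
        self.name = name
        self.tokens = tokens  # tokens this server owns outright

    def sell(self, n=1):
        # Purely local transaction: succeeds iff this server still holds
        # enough tokens; other servers are never contacted.
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def redistribute_to(self, other, n):
        # A DVP-style strategy would decide when, and how many, tokens
        # to move toward servers seeing higher demand; here we just
        # move up to n tokens unconditionally.
        moved = min(n, self.tokens)
        self.tokens -= moved
        other.tokens += moved
        return moved


a, b = TokenServer("A", 10), TokenServer("B", 0)
a.sell(3)               # local success on A (7 tokens left)
b.sell(1)               # fails locally on B: its pool is empty
a.redistribute_to(b, 5) # rebalance: A keeps 2, B now holds 5
b.sell(1)               # now succeeds locally on B
```

The key property the sketch shows is that a server can block or refuse a sale using only its own state, which is what gives partitioning its low latency and immunity to network partitions.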
In this paper, we examine the Data-Value Partitioning (DVP) approach to
token-based commodity
distribution. We propose novel DVP strategies that vary in the way they
redistribute tokens among the
servers of the system. Using a detailed simulation model and real Internet
message traces, we investigate
the performance of our DVP strategies by comparing them against a previously
proposed scheme,
Generalized Site Escrow (GSE), which is based on replication and escrow
transactions. Our experiments
demonstrate that, for the types of applications and environment we address,
replication-based approaches
are neither necessary nor desirable, as they inherently require quorum
synchronization to maintain
consistency. We show that DVP, primarily due to its ability to provide high
server autonomy, performs
favorably in all cases studied.
(Also cross-referenced as UMIACS-TR-2000-6)
A Security Infrastructure for Mobile Transactional Systems
In this paper, we present an infrastructure for providing secure transactional
replication support for peer-to-peer, decentralized databases. We first
describe how to effectively provide protection against external threats,
malicious actions by servers not authorized to access data, using conventional
cryptography-based mechanisms. We then classify and present algorithms that
provide protection against internal threats, malicious actions by authenticated
servers that misrepresent protocol-specific information. Our approach to
handling internal threats uses both cryptographic techniques and modifications
to the update commit criteria. The techniques we propose are unique in that
they not only enable a tradeoff between performance and the degree of tolerance
to malicious servers, but also allow for individual servers to support
non-uniform degrees of tolerance without adversely affecting the performance of
the rest of the system.
We investigate the cost of our security mechanisms in the context of Deno: a
prototype object replication system designed for use in mobile and
weakly-connected environments. Experimental results reveal that protecting
against internal threats comes at a cost, but the marginal cost for protecting
against larger cliques of malicious insiders is generally low. Furthermore,
comparison with a decentralized Read-One Write-All protocol shows that our
approach performs significantly better under various workloads.
(Also cross-referenced as UMIACS-TR-2000-59)
S-Store: Streaming Meets Transaction Processing
Stream processing addresses the needs of real-time applications. Transaction
processing addresses the coordination and safety of short atomic computations.
Heretofore, these two modes of operation existed in separate, stove-piped
systems. In this work, we attempt to fuse the two computational paradigms in a
single system called S-Store. In this way, S-Store can simultaneously
accommodate OLTP and streaming applications. We present a simple transaction
model for streams that integrates seamlessly with a traditional OLTP system. We
chose to build S-Store as an extension of H-Store, an open-source, in-memory,
distributed OLTP database system. By implementing S-Store in this way, we can
make use of the transaction processing facilities that H-Store already
supports, and we can concentrate on the additional implementation features that
are needed to support streaming. Similar implementations could be done using
other main-memory OLTP platforms. We show that we can actually achieve higher
throughput for streaming workloads in S-Store than an equivalent deployment in
H-Store alone. We also show how this can be achieved within H-Store with the
addition of a modest amount of new functionality. Furthermore, we compare
S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm,
and show how S-Store matches and sometimes exceeds their performance while
providing stronger transactional guarantees.
An End-to-end Neural Natural Language Interface for Databases
The ability to extract insights from new data sets is critical for decision
making. Visual interactive tools play an important role in data exploration
since they provide non-technical users with an effective way to visually
compose queries and comprehend the results. Natural language has recently
gained traction as an alternative query interface to databases with the
potential to enable non-expert users to formulate complex questions and
information needs efficiently and effectively. However, understanding natural
language questions and translating them accurately to SQL is a challenging
task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet
made their way into practical tools and commercial products.
In this paper, we present DBPal, a novel data exploration tool with a natural
language interface. DBPal leverages recent advances in deep models to make
query understanding more robust in the following ways: First, DBPal uses a deep
model to translate natural language statements to SQL, making the translation
process more robust to paraphrasing and other linguistic variations. Second, to
support the users in phrasing questions without knowing the database schema and
the query features, DBPal provides a learned auto-completion model that
suggests partial query extensions to users during query formulation and thus
helps users to write complex queries.
S-Store: a streaming NewSQL system for big velocity applications
First-generation streaming systems did not pay much attention to state
management via ACID transactions (e.g., [3, 4]). S-Store is a data management
system that combines OLTP transactions with stream processing. To create
S-Store, we begin with H-Store, a main-memory transaction processing engine,
and add primitives to support streaming. This includes triggers and
transaction workflows to implement push-based processing, windows to provide
a way to bound the computation, and tables with hidden state to implement
scoping for proper isolation. This demo explores the benefits of this
approach by showing how a naïve implementation of our benchmarks using only
H-Store can yield incorrect results. We also show that by exploiting
push-based semantics and our implementation of triggers, we can achieve
significant improvement in transaction throughput. We demo two modern
applications: (i) leaderboard maintenance for a version of "American Idol",
and (ii) a city-scale bicycle rental scenario.
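The push-based primitives named in this abstract can be sketched in miniature: inserting into a stream fires a downstream computation (the trigger), and a sliding window bounds the state that computation reads. This is a toy model only, not S-Store's actual API; `StreamTable` and the vote-counting example are hypothetical.

```python
from collections import deque


class StreamTable:
    """Toy sketch (not S-Store's API) of push-based stream processing:
    each insert fires a registered trigger, and a bounded sliding
    window limits the state visible to the downstream computation."""

    def __init__(self, window_size, on_insert):
        self.window = deque(maxlen=window_size)  # sliding window state
        self.on_insert = on_insert               # trigger: downstream step

    def insert(self, tup):
        self.window.append(tup)
        # In the real system the trigger runs atomically with the insert
        # as part of a transaction workflow; here we simply invoke it to
        # model the push-based dataflow.
        self.on_insert(list(self.window))


# Leaderboard-style example: total votes over the last 3 events.
totals = []
votes = StreamTable(
    window_size=3,
    on_insert=lambda w: totals.append(sum(v for _, v in w)),
)
for event in [("idol-A", 1), ("idol-B", 1), ("idol-A", 1), ("idol-A", 1)]:
    votes.insert(event)
# totals now holds one windowed total per insert: [1, 2, 3, 3]
```

The point of the sketch is that the downstream computation is driven by arrivals (push) rather than by polling, and the window keeps its state bounded regardless of how long the stream runs.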
Light-Weight Currency Management Mechanisms in Deno
This paper discusses the currency management mechanisms used in Deno, a replicated-object storage system.